ptairMS 0.0.0
The ptairMS package provides a workflow to process PTR-TOF-MS raw data in the open Hierarchical Data Format 5 (HDF5; .h5 extension), and generate the peak table as an ExpressionSet object for subsequent data analysis with the many methods and packages available in R. Applications include the analysis of exhaled air, headspace or ambient air. The package offers several features to check the raw data and tune the few processing parameters. It also enables to include new samples in a study without re-processing all the previous data, providing a convenient management for cohort studies.
Proton Transfer Reaction - Mass Spectrometry (PTR-MS) has emerged with excellent sensitivity and specificity for VOC analysis in a wide range of applications, including environment, food quality, and biology (Blake, Monks, and Ellis 2009). In the area of health and care, PTR-MS opens up unique opportunities for real-time analysis “at the patient’s bedside”.
The three described approaches for the preprocessing of PTR-TOF-MS data to date, i.e. PTR-MS Viewer (Ionicon; commercial), PTR-TOF Data Analyzer (Müller et al. 2013), and PTRwid (Holzinger 2015), do not address the untargeted analysis of exhaled breath because i) they are either limited to targeted analysis or require very long computation time, ii) the have been developed for continuous environmental monitoring and do not take into account the time dimension, and iii) the algorithms cannot be accessed, nor modified, and such modules cannot be integrated into comprehensive bioinformatics workflows.
Characterization of volatile organic compounds (VOCs) emitted by living organisms is of major interest in medicine, food sciences, and ecology. As an example, thousands of VOCs have been identified in the exhaled breath, resulting from normal metabolism or pathological processes (Lacy Costello et al. 2014). The main advantage of breath analysis in medicine is that the sampling is non-invasive (Devillier et al. 2017). Methods based on mass spectrometry (MS) are the reference technologies for VOC analysis because of their sensitivity and large dynamic range.
First, a ptrSet object is generated with the function createPtrSet from the directory containing the raw files, possibly grouped into subfolders according to classes of samples. The object will contain all information about the study: the sample metadata, one peak list for each samples, and several quality metrics about the files. Second, the peak lists are aligned between samples and an ExpressionSet (Biobase package object) object is returned , containing the peak table, sample metadata, and feature metadata. Missing values in the dataMatrix can be replaced by the integrated signal in the expected raw data region. The features can be annotated with the annotateVOC function, based on the database described in (Lacy Costello et al. 2014).
ptairData packageTwo real PTR-TOF-MS raw data sets are available on the ptairData package.
exhaled air of two healthy individal: three acquisitions per individual on diffrents day with two exprirations
mycobateria culture air analysis: two replicates of two species and control (culture medium)
For memory reasons, the raw data are truncate in mass axis at the range: \([20.4,21.6] U [50 , 150]\) for individual, and $ [20.4 ; 21.6] U [56.4 ; 90.6]$ for mycobacteria.
library(ptairMS)
library(ptairData)
Start the analysis from a raw data directory. For this example, we use one of the ptairData package:
dirRaw <- system.file("extdata/exhaledAir", package = "ptairData")
Before processing data, there is two important steps:
calibrate the mass axis
determine the interest acquisition time periods (expirations or head space analysis for our examples).
To perform and check the quality of this steps, we first use the createPtrSet function, that will indivadually for each file read the raw data, calibrate the mass axis with mzCalibRef parameter, determine expirations or head space analysis limits that depend on fracMaxTIC parameter (see Determine expirations or head spaces analysis limits section), and provide a default sampleMetadata based on the tree structure of the directory.
This function creates a ptrSet object that contains all necessary information about the files in the study directory: calibration prameters, time limits, peak lists (as this step it is empty), and primary ion quanitfication. It can be update if new files are added are deleted from the directory with the function updatePtrSet (see update section).
To create a ptrSet object form the raw data directory:
exhaledSet <- createPtrSet(dirRaw,
setName="exhaledSet",
mzCalibRef = c(21.022, 60.0525),
fracMaxTIC = 0.7,
saveDir = NULL )
## ind1-1.h5 check
## ind1-2.h5 check
## ind1-3.h5 check
## ind2-1.h5 check
## ind2-2.h5 check
## ind2-3.h5 check
exhaledSet
## ptrSet object : exhaledSet
## directory: C:/Users/CR258086/Documents/R/R-3.6.1/library/ptairData/extdata/exhaledAir
## 6 files contains in the directory
## 6 files check
## 0 files processed by detectPeak
To get the list of file names in your directory (at the last time the ptrSet was created or updated), use :
getFileNames(exhaledSet)
## [1] "ind1-1.h5" "ind1-2.h5" "ind1-3.h5" "ind2-1.h5" "ind2-2.h5" "ind2-3.h5"
We will now quickly explain each steps of this function, and show how to choose and check the quality of mzCalibRef and fracMaxTIC parameters.
To convert Time Of Flight (TOF) axis to mass/charge ratio (mz) axis, the following formula is used (Müller et al. (2013)):
To estimate parameters (a , b), reference peaks with known masses (and without overlapping peaks) are needed. For optimal results the mass axis calibration peaks have to be well distributed over the entire m/z. Set those masses in the mzCalibRef parameter. For exhaled air, we propose for addition masses of ionicon:
calib_table<-read.csv(system.file("extdata", "reference_tables/calib_table.tsv", package = "ptairMS"),sep="\t")
knitr::kable(calib_table)
| Formula | mz | compound |
|---|---|---|
| H3 (18O)+ | 21.02204 | isotope of primary ion (H30+) |
| N2H+ | 29.01342 | Diazote |
| C3H5+ | 41.03858 | |
| C3H7O+ | 60.05250 | isotope of Acetone |
| NA | 203.94300 | Ionicon Calibration mass 1 |
| NA | 330.84950 | ionicon Calibration mass 2 |
| Other refere | nce masses ( | at least two) can be provided by the user. |
To check the calibration accurancy, you can use the plot method of ptrSet, that provides trhee plot calibError the calibration error boxplot in ppm, an estimation of the resolution \(\frac{m}{\Delta_m}\) for the mzCalibRef peaks and average normalized peak shape of the mzCalibRef.
plot(exhaledSet)
If a masses contains one or several files outlier (error > 20 ppm), you can check those files with the function plotCalib, that plot for selected files the total ion spectrum around all the mzCalibRef (with the exact masses as red vertical lines).
plotCalib(exhaledSet,fileNames=getFileNames(exhaledSet)[5])
You can then choose to keep or delete this masses from mzCalibRef. To change it, use calibration function on the ptrSte object.
Since we have truncate the mass axis, we can’t use masses all the masses listed above. To calibrate with at least three masses, we add the peak 75.04406 ([C3H6O2+H]+,Hydroxyacetone).
exhaledSet <- calibration(exhaledSet, mzCalibRef = c(21.022, 60.0525,75.04406))
It will return the same object, with the calibration parameters updated.
plot(exhaledSet,type="calibError")
The expirations or head space analysis are determined on the Total Ion Chromatogram. There correspond to the part of the TIC where intensity is higher than fracMaxTIC * max(TIC) with baseline removal. To analyze the entire spectrum (ambiant air), set fracMaxTIC to 0.
Example of expirations detetcion at fracMaxTIC = 0.5
Example of expirations detetcion at fracMaxTIC = 0.9
To see the expiration limits on all files in the ptrSet use plotTIC function:
plotTIC: This function plot the Total Ion Spectrum for several files. By default, when fileNames is NULL, all TICs files are plotted. A pdf file is generated with the pdfFile argument by give an absolute file path with the pdf extension. Finally, you can remove the baseline by set baselineRm = TRUE, and add the time limits to the plot by set showLimits = TRUE. A coloration by a column of the sampleMetadat is also posible by set the column name in colorBy parameter.plotTIC(object = exhaledSet, showLimits = TRUE, pdfFile="AllTIC.pdf")
plotTIC(object = exhaledSet, showLimits = TRUE,fileNames=getFileNames(exhaledSet)[1],type = "ggplot")
To change the fracMaxTIC paramter, use timeLimits function :
exhaledSet<-timeLimits(exhaledSet,fracMaxTIC = 0.8)
An interface to manually change the limit of expiration is in progress.
The createPtrSet function automatically generrates a default sampleMetadata ‘data.frame’. It contains file names in row names, and a column named ‘subfolder’ when the files are organized into subfolders in the parent directory.=. To get this data.frame, use the getSampleMetadata method. Row names of the sampleMetadata must always contains all file names from the directory.
getSampleMetadata(exhaledSet)
## subfolder
## ind1-1.h5 ind1
## ind1-2.h5 ind1
## ind1-3.h5 ind1
## ind2-1.h5 ind2
## ind2-2.h5 ind2
## ind2-3.h5 ind2
You can at any moment obtain this default sample metada data with the function resetSampleMetadata(exhaledSet)
To modify the sample metadata, there exist two diffrent ways:
setSampleMetadatamethod: modify the data.frame and put it into the ptrSet object.sampleMD <- getSampleMetadata(exhaledSet)
colnames(sampleMD)[1] <- "individual"
exhaledSet <- setSampleMetadata(exhaledSet,sampleMD)
getSampleMetadata(exhaledSet)
## individual
## ind1-1.h5 ind1
## ind1-2.h5 ind1
## ind1-3.h5 ind1
## ind2-1.h5 ind2
## ind2-2.h5 ind2
## ind2-3.h5 ind2
exportSampleMetada and importSampleMetadata methods:exportSampleMetada save the data.frame in .tsv format in the directory of your choice. Parameter saveFile must be in .tsv extension. You can open and modify it on any spreadsheet editor, but never change the row names: if is missing one file in row names, there will an error. Once the .tsv file has been modified, it can be imported back into the ptrSet object, with the function importSampleMetadata. It return the same ptrSet object, but with with the updated sampleMetadata.exportSampleMetada(exhaledSet, saveFile = paste(DirBacteria,"sampleMetadata.tsv",sep="/"))
exhaledSet <- importSampleMetadata(exhaledSet, file = paste(DirBacteria,"sampleMetadata.tsv",sep="/"))
Finally, the function saves the object as .Rdata in saveDir directory if it is not NULL, with setName as name. Then, to import the saved ptrSet object, use load base function.
There exist two functions for plotting the raw data:
plotRaw: Displays for a file the image of the matrix of intensities, the EIC, the sum spectrum, for the selected m/z and time ranges and annotation of known VOC possibly contains in the mz range (just for infomation, no peak detection have been done as this step).plotRaw(exhaledSet, mzRange = 59 , fileNames = getFileNames(exhaledSet)[1],showVocDB = T)
## mz_Hplus formula_Hplus cas.name
## 33 59.08552 [C4H10+H]+ 2-methylpropane|n-butane
## 32 59.04914 [C3H6O+H]+ acetone|2-propen-1-ol|propanal
plotFeatures: for all selected files, the average spectrum for all time periods is plotted, with the background superimposed as a dashed line. As plotTIC function, you can choose type plotly or ggplot, and coloring spectrum by a column name of the sample MetadataplotFeatures(exhaledSet,mz=21.022)
plotFeatures(exhaledSet,mz=59.049,type="ggplot",colorBy = "individual")
If you want to delete or added files to the directory after created the ptrSet, modify in the directory and run updatePtrSet function.
exhaledSet <- updatePtrSet(exhaledSet)
## nothing to update
Now we checked if the calibration is efficient, and if the expiration or head space are well determining. The, we can process all files with the detectPeak function.
For each file in the data set, this function :
calibrates each expiration and ambiant air
detects peak in each expiration and ambiant air (see annex 1 for peak detection algorithm)
aligns features between the background and diffrents analyticaltime periods (i.e. headspace or expirations)
keeps features present in at leat fracGroup % of expirations (if exhaled air)
if normalize=TRUE, normalizes with the primary ion (devided all features by the quantity of H3(O18)+ *488), and converts in ppb if the ptr reaction information are available in the h5 file (voltage, temperature and pressure of the drift)
The peakList is then written in the ptrSet object: a list of data.table, which contains for each file all peaks detected, with quantification of analytical time periods and background (ambient air).
exhaledSet <- detectPeak(exhaledSet , mzNominal = NULL,fracGroup=1,normalize = T)
## ind1-1.h5 : 133 peaks detected
## ind1-2.h5 : 139 peaks detected
## ind1-3.h5 : 126 peaks detected
## ind2-1.h5 : 137 peaks detected
## ind2-2.h5 : 145 peaks detected
## ind2-3.h5 : 136 peaks detected
If you want to restrinct the detection to few nominal mass, set them in mzNominal, for example exhaledSet <- detectPeak(exhaledSet , mzNominal = c(21,59)).
This step can take few times if there are many files and a large mz range. You can then parallelize the algorithm by set parallel = TRUE and give the number of cores in nbCores (the number of files that will be processed simultaneous). To see the number of CPU cores available in your computer, use parallel::detectCores().
To see the resulting peak lists, use getPeakList method. It return a list with two elements:
raw: contains the peak List obtained of each expiration and background contatened, with the following column :
aligned: contains the peak list obtained after alignment beteween background and time periods:
if fracMaxTIC is set to zero at the begining, ranw and aligned peakList are the same
peakList<-getPeakList(exhaledSet)
head(peakList$aligned$`individual4-1.h5`)
## NULL
ptrSet peakListThe ptairMs package allow to update the ptrSet object linked to a directory, as the data is acquired. When you acquire new data in the directory, first apply updatePtrSet function to the object, reset the sample metadata or complet it, and then apply detectPeak function.
exhaledSet<-updatePtrSet(exhaledSet)
exhaledSet<-setSampleMetadata(exhaledSet,resetSampleMetadata(exhaledSet))
exhaledSet<-detectPeak(exhaledSet)
The alignment between samples (i.e. matching of variables between the peak lists of the samples) is performed by using a kernel gaussian density (Delabrière et al. 2017). At this stage, the table of peak intensities has been built, and will be handled together with the sample meta data (from ptrSet) and variable meta data containing the background peak table, as an ExpressionSet object.
It is possible to apply two filters:
On the background: only variables with intensities greater than bgThreshold * background are kept
On samples reproducibility: only variables which are detected in more than \(fracGroup\) % of samples are kept.
If you do not want to apply those filters, set fracGroup and bgThreshold to 0.
eSet <- alignSamples(exhaledSet, group="individual", fracGroup = 1, bgThreshold = 4 )
## 31 peaks aligned
To see those three tables contains in the ExpressionSet:
knitr::kable(head(Biobase::exprs(eSet)))
| ind1-1.h5 | ind1-2.h5 | ind1-3.h5 | ind2-1.h5 | ind2-2.h5 | ind2-3.h5 | |
|---|---|---|---|---|---|---|
| 51.0442 | 2.9207285 | 2.8951917 | 2.4713660 | 8.839019 | 6.8739324 | 11.2131810 |
| 55.0389 | 45.4250743 | 29.9315739 | 34.9467164 | 53.857206 | 53.8403746 | 40.9425155 |
| 57.033 | 2.5827417 | 2.6633915 | 2.0290540 | 2.650103 | 6.6955009 | 2.5530096 |
| 58.0401 | 0.5620376 | 0.7557753 | 0.5553208 | 0.559625 | 0.5495952 | 0.6863493 |
| 59.0498 | 977.3908299 | 1461.1439782 | 1105.9457217 | 965.571330 | 746.9824237 | 1365.1584509 |
| 60.0526 | 28.7531310 | 41.2103019 | 30.9498646 | 30.028931 | 22.5069782 | 40.2432164 |
knitr::kable(head(Biobase::fData(eSet)))
| Background - ind1-1.h5 | Background - ind1-2.h5 | Background - ind1-3.h5 | Background - ind2-1.h5 | Background - ind2-2.h5 | Background - ind2-3.h5 | |
|---|---|---|---|---|---|---|
| 51.0442 | 0.0553349 | 0.0869217 | NaN | NaN | 0.0525769 | 0.0522817 |
| 55.0389 | 0.7546768 | 0.9956533 | 0.6909736 | 0.6928665 | 0.5618447 | 0.6924735 |
| 57.033 | 1.1260609 | 1.4269247 | 0.8211809 | 0.8360225 | 1.1612819 | 1.1188522 |
| 58.0401 | 0.0751487 | 0.1386051 | 0.0777504 | 0.0782107 | 0.1064082 | 0.0915683 |
| 59.0498 | 24.6982617 | 51.6309914 | 17.9742560 | 19.1219327 | 36.5289178 | 29.5324392 |
| 60.0526 | 0.9117387 | 1.7352323 | 0.7091926 | 0.7350493 | 1.2895062 | 1.1039797 |
knitr::kable(head(Biobase::pData(eSet)))
| individual | |
|---|---|
| ind1-1.h5 | ind1 |
| ind1-2.h5 | ind1 |
| ind1-3.h5 | ind1 |
| ind2-1.h5 | ind2 |
| ind2-2.h5 | ind2 |
| ind2-3.h5 | ind2 |
To imputue missing values, ptairMS returns back to the raw data, and compute with the same method as detectPeak the missing values without limit of detection.
eSet <- ptairMS::impute(eSet, exhaledSet)
ptairMS propose a putative annotation based on the database described in (Lacy Costello et al. 2014),with the annotateVOC function. Applied to an expression set, it write in the features metadata all possible formulas and names found.
annotateVOC(59.049)
## vocDB_mz_Hplus vocDB_formula_Hplus vocDB_cas.name
## 59.049 59.04914 [C3H6O+H]+ 2-propen-1-ol, acetone, propanal
eSet<-annotateVOC(eSet)
knitr::kable(head(Biobase::fData(eSet)))
| Background - ind1-1.h5 | Background - ind1-2.h5 | Background - ind1-3.h5 | Background - ind2-1.h5 | Background - ind2-2.h5 | Background - ind2-3.h5 | vocDB_mz_Hplus | vocDB_formula_Hplus | vocDB_cas.name | |
|---|---|---|---|---|---|---|---|---|---|
| 51.0442 | 0.0553349 | 0.0869217 | NaN | NaN | 0.0525769 | 0.0522817 | |||
| 55.0389 | 0.7546768 | 0.9956533 | 0.6909736 | 0.6928665 | 0.5618447 | 0.6924735 | |||
| 57.033 | 1.1260609 | 1.4269247 | 0.8211809 | 0.8360225 | 1.1612819 | 1.1188522 | 57.03349 | [C3H4O+H]+ | 2-propenal |
| 58.0401 | 0.0751487 | 0.1386051 | 0.0777504 | 0.0782107 | 0.1064082 | 0.0915683 | |||
| 59.0498 | 24.6982617 | 51.6309914 | 17.9742560 | 19.1219327 | 36.5289178 | 29.5324392 | 59.04914 | [C3H6O+H]+ | 2-propen-1-ol, acetone, propanal |
| 60.0526 | 0.9117387 | 1.7352323 | 0.7091926 | 0.7350493 | 1.2895062 | 1.1039797 |
Then you can export the expressionSet with the function writeEset into 3 tabulated files ‘dataMatrix.tsv’, sampleMetadata.tsv’, and ‘variableMetadata.tsv’:
writeEset(eset, dirC = file.path(getwd(), "processed_dataset"))
You can now use known package as ropls to performs PCA or further statistical analysis.
X<-Biobase::exprs(eSet)
SMD<-Biobase::pData(eSet)
features<-Biobase::fData(eSet)
ropls::view(log2(X),printL = FALSE)
pca<-ropls::opls(t(log2(X)),crossvalI=5,info.txtC="none",fig.pdfC="none")
plot(pca,type="overview",parAsColFcVn=SMD$individual)
plot(pca,type="x-score",parAsColFcVn=SMD$individual)
plot(pca,type="x-loading")
dim1<-ropls::getLoadingMN(pca)[,1]
barplot(sort(abs(dim1),decreasing = T))
knitr::kable(features[names(sort(abs(dim1),decreasing = T)),c("vocDB_formula_Hplus","vocDB_cas.name")])
| vocDB_formula_Hplus | vocDB_cas.name | |
|---|---|---|
| 67.0538 | [C5H6+H]+ | 1,3-cyclopentadiene |
| 70.0732 | ||
| 69.0699 | [C5H8+H]+, [H8C5+H]+ | (E)-, 1,3-pentadiene, 3-methyl-1-butyne, cyclopentene, isoprene, pentyne ui |
| 68.0598 | ||
| 109.0663 | [C7H8O+H]+ | 1-hydroxy-3-methylbenzene, 4-methylphenol , benzyl alcohol, p-cresol |
| 138.1339 | ||
| 51.0442 | ||
| 137.131 | [C10H16+H]+ | 1-methyl-4-(1-methylethyl)-1,3-cyclohexadiene, 1-methyl-4-(1-methylethyl)-1,4-cyclohexadiene, 7-methyl-3-methylene-1,6-octadieneFBrSa, 1-methyl-4-(1-methylethylidene)-cyclohexene, 2-methyl-5-(1-methylethyl)-bicyclo[3.1.0]hex-2-ene, 3-thujene , 3,7-dimethyl-1,3,6-octatriene, 3,7,7-trimethyl-bicyclo[4.1.0]hept-3-ene, 4-methylene-1-(1-methylethyl)-bicyclo[3.1.0]hexane, alpha-pinene, alpha-terpinene, beta-myrcene, beta-phellandrene, beta-pinene, beta-terpinene, camphene, cis-beta-ocimene, d-limonene, delta terpinene, dl-limonene, gamma-terpinene, terpinolene , trans-beta-ocimene, tricyclene |
| 81.0694 | [C6H8+H]+ | 1,3-cyclohexadiene, 1,4-cyclohexadiene |
| 55.0389 | ||
| 82.0721 | ||
| 95.0851 | ||
| 96.0872 | ||
| 57.033 | [C3H4O+H]+ | 2-propenal |
| 87.0801 | [C5H10O+H]+, [H10C5O+H]+ | 2-methylbutanal, 2-pentanone, 3-methyl-2-buten-1-ol, 3-methyl-3-buten-1-ol, 3-methylbutanal, 3-pentanone, E-2-penten-1-ol, methylbutanal, pentanal |
| 76.0476 | ||
| 71.0487 | [C4H6O+H]+ | 2-butenal , 2,3-dihydrofuran, 3-buten-2-one , crotonaldehyde, methyl vinyl ketone |
| 59.0498 | [C3H6O+H]+ | 2-propen-1-ol, acetone, propanal |
| 96.0532 | ||
| 95.0492 | [C6H6O+H]+ | phenol |
| 60.0526 | ||
| 65.0593 | ||
| 75.0438 | [C3H6O2+H]+ | 1-hydroxy-2-propanone, 1,3-dioxolan, acetic acid, ethyl ester , ethyl formate, methanoic acid, methyl ester, propanoic acid , propionic acid |
| 58.0401 | ||
| 89.0589 | [C4H8O2+H]+ | 1,4-diethylene dioxide, 2-methylpropanoic acid, acetic acid, butanoic acid , butyric acid, ethyl acetate, ethyl ester, methyl ester, propanoic acid |
| 66.0645 | ||
| 97.0278 | [C5H4O2+H]+, [H4C5O2+H]+ | 5-hydroxy-2,4-pentadienoicacid, 2-pyranone, 3-furaldehyde, delta-lactone, furaldehyde, furaldehyde ui, furfural |
| 63.0264 | [C2H6S+H]+ | dimethylsulfide , thiopropane |
| 77.0597 | [C3H8O2+H]+ | 1-hydroxy-2-methoxyethane, 1,2-propanediol, dimethoxymethane |
| 63.0073 | [CH2O3+H]+ | carbonic acid |
| 78.0626 |
To read and access to the entire raw data of one file, use readRaw function. It opens the data from a absolute file path whith rhdf5 library, and returns a ptrRaw object containing the raw data matrix, the time axis, the m/z axis obtained after calibration if calibTIS = TRUE, and information contained in the h5 file (transmission curve, drift temperature, …).
To get a absolute file path of one file:
samplePath <-list.files(dirRaw,recursive = T,full.names = T,pattern = ".h5$")[1]
To read the file :
sampleRaw <- readRaw(samplePath, calibTIS = F)
sampleRaw
## ind1-1.h5
## mz range: 20.4 - 150.7
## time range: 0 - 49
## Calibration error in ppm:
## No calibration performs
After that, you can use same method as ptrSet object: calibration, timeLimits, plotCalib, plotTIC, plotRaw, and detectPeak.
This research was developed within the Data Sciences for Deep Phenotyping and Precision Medicine team at CEA (C. Roquencourt, P. Zheng, P. Roger-Mele and E. Thévenot) in collaboration with the Exhalomics platform from the Foch Hospital (S. Grassin-Delyle, P. Devillier, H. Salvator, E. Naline and L.-J. Couderc) within the SoftwAiR project supported by the Agence Nationale de la Recherche (ANR-18-CE45-0017 grant).
For each nominal mass m/z:
Selection of the m/z range: [m/z - 0.6, m/z + 0.6] and definition of m/z windows for noise ([m/z - 0.6, m/z - 0.4] U [m/z + 0.4, m/z + 0.6]) and signal ([m/z - 0.4, m/z + 0.4])
Removing baseline (CLAYTON et al., n.d.)
Smoothing the signal with Savitzky Golay windows (Savitzky and Golay 1964) computed by using noise autocorrelation (Vivó-Truyols and Schoenmakers 2006)
Detection of peak maxima by using the derivatives of Savitzky Golay filter (all concave regions are determined with the second derivative, and the maxima correspond in each of those regions to the point with the first deriative closest to zero)
Thresholding to the noise maximum arround the mass
Iterrative fit function sech2: \(peak_i(h_i,m_i,\Delta_{1i},\Delta_{2i}) = {h_i}/{(cosh(\Delta_{1i}*(x-m_i))^2*(x<=m_i)+cosh(\Delta_{2i}*(x-m_i))^2*(x>m_i))}\), with initialisation and constains:
Quantification by summing the fitted peaks over a range of 10*\(\Delta_i\)
peakList<-PeakList(sampleRaw, mzNominal = c(63),
plotFinal=TRUE ,plotAll =TRUE)
peakList$peak
## mz quanti delta_mz resolution R2 parameter.1 parameter.2
## 1 63.00499 2078.309 0.01242852 5069.388 0.9992448 165.30514 124.1945
## 2 63.02435 3323.928 0.01148845 5485.889 0.9992448 158.90175 148.3347
## 3 63.04147 1042.109 0.01692215 3725.382 0.9992448 83.88503 137.3878
## parameter.3
## 1 264.75165
## 2 458.04107
## 3 97.51366
Here is the output of sessionInfo() on the system on which this document was compiled:
## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
## [5] LC_TIME=French_France.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ptairData_0.1.0 ptairMS_0.0.0 data.table_1.12.8 BiocStyle_2.14.2
##
## loaded via a namespace (and not attached):
## [1] ProtGenerics_1.18.0 bit64_0.9-7 RColorBrewer_1.1-2
## [4] doParallel_1.0.15 httr_1.4.1 MSnbase_2.12.0
## [7] tools_3.6.1 backports_1.1.5 R6_2.4.1
## [10] affyio_1.56.0 rpart_4.1-15 Hmisc_4.3-0
## [13] lazyeval_0.2.2 BiocGenerics_0.32.0 colorspace_1.4-1
## [16] nnet_7.3-12 tidyselect_0.2.5 gridExtra_2.3
## [19] bit_1.1-14 compiler_3.6.1 preprocessCore_1.48.0
## [22] Biobase_2.46.0 htmlTable_1.13.3 plotly_4.9.1
## [25] labeling_0.3 bookdown_0.16 checkmate_1.9.4
## [28] scales_1.1.0 affy_1.64.0 stringr_1.4.0
## [31] digest_0.6.23 foreign_0.8-73 rmarkdown_2.0
## [34] jpeg_0.1-8.1 base64enc_0.1-3 pkgconfig_2.0.3
## [37] htmltools_0.4.0 fastmap_1.0.1 limma_3.42.0
## [40] highr_0.8 htmlwidgets_1.5.1 rlang_0.4.2
## [43] rstudioapi_0.10 impute_1.60.0 shiny_1.4.0
## [46] farver_2.0.1 jsonlite_1.6 crosstalk_1.0.0
## [49] mzID_1.24.0 BiocParallel_1.20.0 ropls_1.18.0
## [52] acepack_1.4.1 dplyr_0.8.3 magrittr_1.5
## [55] Formula_1.2-3 Matrix_1.2-18 MALDIquant_1.19.3
## [58] Rcpp_1.0.3 munsell_0.5.0 S4Vectors_0.24.1
## [61] Rhdf5lib_1.8.0 lifecycle_0.1.0 vsn_3.54.0
## [64] stringi_1.4.3 yaml_2.2.0 MASS_7.3-51.4
## [67] zlibbioc_1.32.0 rhdf5_2.30.1 plyr_1.8.5
## [70] grid_3.6.1 parallel_3.6.1 promises_1.1.0
## [73] crayon_1.3.4 lattice_0.20-38 splines_3.6.1
## [76] mzR_2.20.0 zeallot_0.1.0 knitr_1.26
## [79] pillar_1.4.2 codetools_0.2-16 stats4_3.6.1
## [82] XML_3.98-1.20 glue_1.3.1 evaluate_0.14
## [85] latticeExtra_0.6-29 pcaMethods_1.78.0 BiocManager_1.30.10
## [88] png_0.1-7 vctrs_0.2.1 httpuv_1.5.2
## [91] foreach_1.4.7 gtable_0.3.0 purrr_0.3.3
## [94] tidyr_1.0.0 assertthat_0.2.1 ggplot2_3.2.1
## [97] xfun_0.11 mime_0.8 xtable_1.8-4
## [100] later_1.0.0 ncdf4_1.17 survival_3.1-8
## [103] viridisLite_0.3.0 signal_0.7-6 minpack.lm_1.2-1
## [106] tibble_2.1.3 iterators_1.0.12 IRanges_2.20.1
## [109] cluster_2.1.0 ellipsis_0.3.0
Blake, Robert S., Paul S. Monks, and Andrew M. Ellis. 2009. “Proton-Transfer Reaction Mass Spectrometry.” Chemical Reviews 109 (3): 861–96. doi:10.1021/cr800364q.
CLAYTON, C G RYAN, E Clayton, W L Griffin, and S H Sie. n.d. “SNIP, A STATISTICS-SENSITIVE BACKGROUND TREATMENT FOR THE QUANTITATIVE ANALYSIS OF PIXE SPECTFtA IN GEOSCIENCE APPLICATIONS,” 7.
Delabrière, Alexis, Ulli M Hohenester, Benoit Colsch, Christophe Junot, François Fenaille, and Etienne A Thévenot. 2017. “proFIA: A Data Preprocessing Workflow for Flow Injection Analysis Coupled to High-Resolution Mass Spectrometry.” Edited by Oliver Stegle. Bioinformatics 33 (23): 3767–75. doi:10.1093/bioinformatics/btx458.
Devillier, Philippe, Helene Salvator, Emmanuel Naline, Louis-Jean Couderc, and Stanislas Grassin-Delyle. 2017. “Metabolomics in the Diagnosis and Pharmacotherapy of Lung Diseases.” Current Pharmaceutical Design 23 (14). doi:10.2174/1381612823666170130155627.
Holzinger, R. 2015. “PTRwid: A New Widget Tool for Processing PTR-TOF-MS Data.” Atmospheric Measurement Techniques 8 (9): 3903–22. doi:10.5194/amt-8-3903-2015.
Lacy Costello, B de, A Amann, H Al-Kateb, C Flynn, W Filipiak, T Khalid, D Osborne, and N M Ratcliffe. 2014. “A Review of the Volatiles from the Healthy Human Body.” Journal of Breath Research 8 (1): 014001. doi:10.1088/1752-7155/8/1/014001.
Müller, Markus, Tomáš Mikoviny, Werner Jud, Barbara D’Anna, and Armin Wisthaler. 2013. “A New Software Tool for the Analysis of High Resolution PTR-TOF Mass Spectra.” Chemometrics and Intelligent Laboratory Systems 127 (August): 158–65. doi:10.1016/j.chemolab.2013.06.011.
Savitzky, Abraham., and M. J. E. Golay. 1964. “Smoothing and Differentiation of Data by Simplified Least Squares Procedures.” Analytical Chemistry 36 (8): 1627–39. doi:10.1021/ac60214a047.
Vivó-Truyols, Gabriel, and Peter J. Schoenmakers. 2006. “Automatic Selection of Optimal Savitzky−Golay Smoothing.” Analytical Chemistry 78 (13): 4598–4608. doi:10.1021/ac0600196.